
Harvard University - DASH

Fast Association Tests for Genes with FAST

Author: Arking Dan E.
Bader Joel S.
Chanda Pritam
Huang Hailiang
Publication venue: Public Library of Science (PLoS)
Publication date: 23/07/2013

Gene-based tests of association can increase the power of a genome-wide association study by aggregating multiple independent effects across a gene or locus into a single stronger signal. Recent gene-based tests take distinct approaches to selecting which variants to aggregate within a locus, modeling the effects of linkage disequilibrium, representing fractional allele counts from imputation, and managing permutation tests for p-values. Implementing these tests in a single, efficient framework has great practical value. Fast ASsociation Tests (FAST) addresses this need by implementing leading gene-based association tests together with conventional SNP-based univariate tests and providing a consolidated, easily interpreted report. FAST scales readily to genome-wide SNP data with millions of SNPs and tens of thousands of individuals, provides implementations that are orders of magnitude faster than original literature reports, and provides a unified framework for performing several gene-based association tests concurrently and efficiently on the same data. Availability: https://bitbucket.org/baderlab/fast/downloads/FAST.tar.gz, with documentation at https://bitbucket.org/baderlab/fast/wiki/Hom

CiteSeerX

FigShare

Information-theoretic gene-gene and gene-environment interaction analysis of quantitative traits

Author: Chanda Pritam
Liu Song
Ramanathan Murali
Sucheston Lara
Zhang Aidong
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The purpose of this research was to develop a novel information-theoretic method and an efficient algorithm for analyzing the gene-gene interactions (GGI) and gene-environment interactions (GEI) associated with quantitative traits (QT). The method is built on two information-theoretic metrics, the k-way interaction information (KWII) and phenotype-associated information (PAI). The PAI is a novel information-theoretic metric that is obtained from the total information correlation (TCI) metric by removing the contributions of inter-variable dependencies (resulting from factors such as linkage disequilibrium and common sources of environmental pollutants). Results: The KWII and the PAI were critically evaluated and incorporated within an algorithm called CHORUS for analyzing QT. The combinations with the highest values of KWII and PAI identified each known GEI associated with the QT in the simulated data sets. The CHORUS algorithm was tested using the simulated GAW15 data set and two real GGI data sets from QTL mapping studies of high-density lipoprotein levels/atherosclerotic lesion size and ultraviolet light-induced immunosuppression. The KWII and PAI were found to have excellent sensitivity for identifying the key GEI simulated to affect the two quantitative trait variables in the GAW15 data set. In addition, both metrics showed strong concordance with the results of the two different QTL mapping data sets. Conclusion: The KWII and PAI are promising metrics for analyzing the GEI of QT.
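To make the KWII concrete, the following is a minimal sketch, not the CHORUS implementation. It assumes discrete variables and uses the usual interaction-information form, KWII(S) = -Σ over non-empty subsets T of S of (-1)^(|S|-|T|) H(T); the abstract does not state this formula, so treat it, along with the function names `entropy` and `kwii`, as assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(columns):
    """Shannon entropy (in bits) of the empirical joint distribution of the columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def kwii(columns):
    """Interaction information: alternating sum of entropies over all non-empty subsets."""
    k = len(columns)
    total = 0.0
    for size in range(1, k + 1):
        sign = (-1) ** (k - size)
        for subset in combinations(columns, size):
            total += sign * entropy(subset)
    return -total


# Toy data (hypothetical): the phenotype y is the XOR of two binary variables.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
y = [x ^ z for x, z in zip(a, b)]

three_way = kwii([a, b, y])  # a and b only inform y jointly
two_way = kwii([a, b])       # a and b are independent of each other
```

For two variables this quantity reduces to mutual information, so `two_way` is zero for the independent pair, while the XOR phenotype gives a positive three-way value because neither variable alone carries information about `y`.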

Springer - Publisher Connector

Comparison of information-theoretic to statistical methods for gene-gene interactions in the presence of genetic heterogeneity

Author: Chanda Pritam
Ramanathan Murali
Sucheston Lara
Tritchler David
Zhang Aidong
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Multifactorial diseases such as cancer and cardiovascular diseases are caused by the complex interplay between genes and environment. The detection of these interactions remains challenging due to computational limitations. Information-theoretic approaches use computationally efficient directed search strategies and thus provide a feasible solution to this problem. However, the power of information-theoretic methods for interaction analysis has not been systematically evaluated. In this work, we compare the power and Type I error of an information-theoretic approach to those of existing interaction analysis methods. Methods: The k-way interaction information (KWII) metric for identifying variable combinations involved in gene-gene interactions (GGI) was assessed using several simulated data sets under models of genetic heterogeneity driven by susceptibility-increasing loci with varying allele frequency, penetrance values and heritability. The power and proportion of false positives of the KWII were compared to those of multifactor dimensionality reduction (MDR), the restricted partitioning method (RPM) and logistic regression. Results: The power of the KWII was considerably greater than that of MDR on all six simulation models examined. For a given disease prevalence at high values of heritability, the power of both RPM and KWII was greater than 95%. For models with low heritability and/or genetic heterogeneity, the power of the KWII was consistently greater than that of RPM; the improvements in power for the KWII over RPM ranged from 4.7% to 14.2% at α = 0.001 in the three models at the lowest heritability values examined. KWII performed similarly to logistic regression. Conclusions: Information-theoretic models are flexible and have excellent power to detect GGI under a variety of conditions that characterize complex diseases.

Springer - Publisher Connector

Information Theory in Computational Biology: Where We Stand Today

Author: Chanda Pritam
Costa Eduardo
Hu Jie
Sukumar Shravan
Van Hemert John
Walia Rasna
Publication venue: MDPI AG
Publication date: 01/06/2020

"A Mathematical Theory of Communication" was published in 1948 by Claude Shannon to address problems in data compression and communication over (noisy) channels. Since then, the concepts and ideas developed in Shannon's work have formed the basis of information theory, a cornerstone of statistical learning and inference, and have played a key role in disciplines such as physics and thermodynamics, probability and statistics, computational sciences and biological sciences. In this article we review the basic concepts of information theory and describe their key applications in multiple major areas of research in computational biology: gene expression and transcriptomics, alignment-free sequence comparison, sequencing and error correction, genome-wide disease-gene association mapping, metabolic networks and metabolomics, and protein sequence, structure and interaction analysis.

IUPUIScholarWorks

HapZipper: sharing HapMap populations just got easier

Author: Ahn
Altshuler
Brandon
Burrows
Christley
Dublin
Eran Elhaik
Joel S. Bader
Levy
Pritam Chanda
Sansom
Schuster
Service
The 1000 Genomes Project Consortium
Wang
Willyard
Ziv
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2011

The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HapZipper, a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as gzip, bzip2 and lzma. We demonstrate the usefulness of HapZipper by compressing HapMap 3 populations to <5% of their original sizes. HapZipper is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz

CiteSeerX

Lund University Publications

White Rose Research Online

Cheminformatics and artificial intelligence for accelerating agrochemical discovery

Author: Avery Sader
Dirk Tomandl
Elizabeth Shipp
Jeremy Wilmot
Jie Hu
John Kinney
Junjun Ou
Max Sharifi
Pritam Chanda
Pulan Yu
Scott Smith
Siva P. Kumpatla
Yannick Djoumbou-Feunang
Publication venue: Frontiers Media S.A.
Publication date: 01/11/2023

The global cost-benefit analysis of pesticide use during the last 30 years has been characterized by a significant increase during the period from 1990 to 2007 followed by a decline. This observation can be attributed to several factors including, but not limited to, pest resistance, lack of novelty with respect to modes of action or classes of chemistry, and regulatory action. Due to current and projected increases in the global population, it is evident that the demand for food and, consequently, the use of pesticides to improve yields will increase. Addressing these challenges and needs while promoting new crop protection agents through an increasingly stringent regulatory landscape requires the development and integration of infrastructures for innovative, cost- and time-effective discovery and development of novel and sustainable molecules. Significant advances in artificial intelligence (AI) and cheminformatics over the last two decades have improved the decision-making power of research scientists in the discovery of bioactive molecules. AI- and cheminformatics-driven molecule discovery offers the opportunity to move experiments from the greenhouse to a virtual environment where thousands to billions of molecules can be investigated at a rapid pace, providing unbiased hypotheses for lead generation and optimization, and effective suggestions for compound synthesis and testing. To date, this is illustrated to a far lesser extent in the publicly available agrochemical research literature than in drug discovery. In this review, we provide an overview of the crop protection discovery pipeline and of how traditional, cheminformatics, and AI technologies can help to address the needs and challenges of agrochemical discovery towards rapidly developing novel and more sustainable products.

Public Library of Science (PLOS)

Gene-Based Tests of Association

Author: A Pfeufer
A Subramanian
A Wille
AL Dixon
Alvaro Alonso
B Efron
B Servin
B Vrtovec
BE Stranger
BL Fridley
BM Neale
C Newton-Cheh
C Verzilli
D Lindley
Dan E. Arking
DE Arking
DE Knuth
DH Ballard
DH Wolpert
DR Nyholt
EG Schouten
EI George
EW Sayers
F Grigioni
G Schwarz
GA Churchill
Hailiang Huang
IA Adzhubei
J Chapman
J Li
JB Veyrieras
JM Cheverud
Joel S. Bader
JZ Liu
K Wang
M Bartlett
M Bogdan
M Cline
M Holden
M Stephens
Mark I. McCarthy
N Sotoodehnia
NW Galwey
P Turrini
Pritam Chanda
R Saxena
R Tibshirani
RA Fisher
RD Ball
S Cheng
S Purcell
T Hastie
T Wu
TT Wu
VA McKusick
Publication venue: Public Library of Science
Publication date: 01/07/2011

Genome-wide association studies (GWAS) are now used routinely to identify SNPs associated with complex human phenotypes. In several cases, multiple variants within a gene contribute independently to disease risk. Here we introduce a novel Gene-Wide Significance (GWiS) test that uses greedy Bayesian model selection to identify the independent effects within a gene, which are combined to generate a stronger statistical signal. Permutation tests provide p-values that correct for the number of independent tests genome-wide and within each genetic locus. When applied to a dataset comprising 2.5 million SNPs in up to 8,000 individuals measured for various electrocardiography (ECG) parameters, this method identifies more validated associations than conventional GWAS approaches. The method also provides, for the first time, systematic assessments of the number of independent effects within a gene and the fraction of disease-associated genes housing multiple independent effects, observed at 35%–50% of loci in our study. This method can be generalized to other study designs, retains power for low-frequency alleles, and provides gene-based p-values that are directly compatible with pathway-based meta-analysis.

CiteSeerX